LRTDP Versus UCT for Online Probabilistic Planning
نویسندگان
چکیده
UCT, the premier method for solving games such as Go, is also becoming the dominant algorithm for probabilistic planning. Out of the five solvers at the International Probabilistic Planning Competition (IPPC) 2011, four were based on the UCT algorithm. However, while a UCT-based planner, PROST, won the contest, an LRTDP-based system, GLUTTON, came in a close second, outperforming other systems derived from UCT. These results raise a question: what are the strengths and weaknesses of LRTDP and UCT in practice? This paper starts answering this question by contrasting the two approaches in the context of finite-horizon MDPs. We demonstrate that in such scenarios, UCT’s lack of a sound termination condition is a serious practical disadvantage. In order to handle an MDP with a large finite horizon under a time constraint, UCT forces an expert to guess a non-myopic lookahead value for which it should be able to converge on the encountered states. Mistakes in setting this parameter can greatly hurt UCT’s performance. In contrast, LRTDP’s convergence criterion allows for an iterative deepening strategy. Using this strategy, LRTDP automatically finds the largest lookahead value feasible under the given time constraint. As a result, LRTDP has better performance and stronger theoretical properties. We present an online version of GLUTTON, named GOURMAND, that illustrates this analysis and outperforms PROST on the set of IPPC-2011 problems.
منابع مشابه
PROST: Probabilistic Planning Based on UCT
We present PROST, a probabilistic planning system that is based on the UCT algorithm by Kocsis and Szepesvári (2006), which has been applied successfully to many areas of planning and acting under uncertainty. The objective of this paper is to show the application of UCT to domainindependent probabilistic planning, an area it had not been applied to before. We furthermore present several enhanc...
متن کاملSolving Uncertain MDPs by Reusing State Information and Plans
While MDPs are powerful tools for modeling sequential decision making problems under uncertainty, they are sensitive to the accuracy of their parameters. MDPs with uncertainty in their parameters are called Uncertain MDPs. In this paper, we introduce a general framework that allows off-theshelf MDP algorithms to solve Uncertain MDPs by planning based on currently available information and repla...
متن کاملProbabilistic Temporal Planning
Planning research has explored the issues that arise when planning with concurrent and durative actions. Separately, planners that can cope with probabilistic effects have also been created. However, few attempts have been made to combine both probabilistic effects and concurrent durative actions into a single planner. The principal one of which we are aware was targeted at a specific domain. W...
متن کاملReverse Iterative Deepening for Finite-Horizon MDPs with Large Branching Factors
In contrast to previous competitions, where the problems were goal-based, the 2011 International Probabilistic Planning Competition (IPPC-2011) emphasized finite-horizon reward maximization problems with large branching factors. These MDPs modeled more realistic planning scenarios and presented challenges to the previous state-of-the-art planners (e.g., those from IPPC-2008), which were primari...
متن کاملASAP-UCT: Abstraction of State-Action Pairs in UCT
Monte-Carlo Tree Search (MCTS) algorithms such as UCT are an attractive online framework for solving planning under uncertainty problems modeled as a Markov Decision Process. However, MCTS search trees are constructed in flat state and action spaces, which can lead to poor policies for large problems. In a separate research thread, domain abstraction techniques compute symmetries to reduce the ...
متن کامل